Distributed General Matrix Multiply and Add for a 2D Mesh Processor Network

Authors

  • Bo Kågström
  • Mikael Rännar
Abstract

A distributed algorithm with the same functionality as the single-processor level 3 BLAS operation GEMM, i.e., general matrix multiply and add, is presented. By the same functionality we mean the ability to perform GEMM operations on arbitrary subarrays of the matrices involved. The logical network is a 2D square mesh with torus connectivity. The matrices involved are distributed with a non-scattered blocked data distribution. The algorithm consists of two main parts: alignment and data movement of the subarrays involved in the operation, and a distributed blocked matrix multiplication algorithm on (sub)matrices that uses only a square submesh. Our general approach makes it possible to perform GEMM operations on non-overlapping submeshes simultaneously.
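As a rough illustration of the second part (blocked multiplication on a square torus mesh), the sketch below simulates a Cannon-style algorithm in plain Python. The abstract does not say which multiplication scheme the authors use, so Cannon's algorithm is an assumption chosen because it relies on torus wraparound shifts. The names `gemm_reference`, `split_blocks`, `join_blocks`, and `cannon_gemm` are hypothetical, the matrices are assumed square with an order divisible by the mesh size `p`, and the alignment and data movement of arbitrary subarrays is not modelled.

```python
def gemm_reference(alpha, A, B, beta, C):
    """Single-processor GEMM on lists of lists: alpha*A*B + beta*C."""
    k = len(B)
    return [[alpha * sum(A[i][t] * B[t][j] for t in range(k)) + beta * C[i][j]
             for j in range(len(B[0]))] for i in range(len(A))]


def split_blocks(M, p):
    """Non-scattered blocked distribution: mesh node (r, c) owns one
    contiguous b x b block of M, where b = n / p."""
    b = len(M) // p
    return [[[row[c * b:(c + 1) * b] for row in M[r * b:(r + 1) * b]]
             for c in range(p)] for r in range(p)]


def join_blocks(blocks):
    """Reassemble a full matrix from a p x p grid of blocks."""
    p, b = len(blocks), len(blocks[0][0])
    return [sum((blocks[r][c][i] for c in range(p)), [])
            for r in range(p) for i in range(b)]


def cannon_gemm(alpha, A, B, beta, C, p):
    """Simulate alpha*A*B + beta*C on a p x p torus (assumed scheme:
    Cannon's algorithm; not necessarily the authors' method)."""
    assert len(A) % p == 0, "matrix order must be divisible by mesh size"
    Ab, Bb, Cb = split_blocks(A, p), split_blocks(B, p), split_blocks(C, p)

    # Each node first scales its local C block by beta.
    acc = [[[[beta * x for x in row] for row in Cb[r][c]]
            for c in range(p)] for r in range(p)]

    # Initial skew: row r of A shifts left by r, column c of B shifts up by c.
    Ab = [[Ab[r][(c + r) % p] for c in range(p)] for r in range(p)]
    Bb = [[Bb[(r + c) % p][c] for c in range(p)] for r in range(p)]

    for _ in range(p):
        # Local multiply-accumulate on every node.
        for r in range(p):
            for c in range(p):
                acc[r][c] = gemm_reference(alpha, Ab[r][c], Bb[r][c], 1, acc[r][c])
        # Shift A blocks one step left and B blocks one step up (torus wrap).
        Ab = [[Ab[r][(c + 1) % p] for c in range(p)] for r in range(p)]
        Bb = [[Bb[(r + 1) % p][c] for c in range(p)] for r in range(p)]

    return join_blocks(acc)
```

Calling `cannon_gemm(2, A, B, 3, C, p=2)` on 4 x 4 integer matrices should agree with `gemm_reference(2, A, B, 3, C)`. In a real distributed setting the list rotations would be neighbour-to-neighbour messages on the mesh.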


Similar resources

Row/Column-First: A Path-based Multicast Algorithm for 2D Mesh-based Network on Chips

In this paper, we propose a new path-based multicast algorithm called the Row/Column-First algorithm. The proposed algorithm constructs a set of multicast paths to deliver a multicast message to all multicast destination nodes. The multicast paths all belong to the row-first or column-first subcategories to maximize the multicast performance. The selection of row-first or column-first appro...


Performance Evaluation of Diffusion Method for load balancing in Distributed Environment

In this paper, we study diffusion load balancing algorithms and their implementation. Our analysis is based on different topologies in a simulated environment. The purpose of a load balancing algorithm is to distribute the excess load of a processor to lightly loaded processors. The objective of this analysis is to find the most stable network among chain, 2D and 3D mesh networks in this diff...


Processor Tagged Descriptors: A Data Structure for Compiling for Distributed-Memory Multicomputers

The computation partitioning, communication analysis, and optimization phases performed during compilation for distributed-memory multicomputers require an efficient way of describing distributed sets of iterations and regions of data. Processor Tagged Descriptors (PTDs) provide these capabilities through a single set representation parameterized by the processor location for each dimension of a ...


Two-Dimensional Boundary-Conforming Orthogonal Grids for External and Internal Flows Using Schwarz-Christoffel Transformation

In this paper, a Schwarz-Christoffel method for generating two-dimensional grids for a variety of complex internal and external flow configurations based on the numerical integration procedure of the Schwarz-Christoffel transformation has been developed by using Mathematica, which is a general purpose symbolic-numerical-graphical mathematics software. This method is highly accurate (fifth order...


CAFT: Cost-aware and Fault-tolerant routing algorithm in 2D mesh Network-on-Chip

The increasing complexity of chips and the need to integrate more components into a chip have made network-on-chip an important infrastructure for on-chip communication and a good alternative to traditional bus-based approaches. As chip density increases, the possibility of failure in the on-chip network increases, and providing correction and fault tol...



Publication date: 1995